An integrated learning algorithm for rule induction
Author
Abstract
This document describes the hierarchical agglomerative cluster algorithm Pnc 2 in the context of the direct generation of If-Then rules for classification tasks. As an agglomerative cluster algorithm, the Pnc 2 initializes each learn data tuple as a single cluster. Then, iteratively, the two closest clusters with the same output value are merged, provided that a merge test is passed. The merge test transforms the generalized cluster into a rule and evaluates it by a kind of hit rate. The rule's premise is the cuboid that encloses the input vectors of all learn data tuples merged in the cluster. This representation suffers in high-dimensional input spaces from the curse of dimensionality (COD), so a special mechanism is used to extend the cuboid during the merge test. A heterogeneous, normalized and weighted Minkowski overlap metric is used to process mixed continuous and nominal inputs. An integrated bagging component can improve accuracy and also reduces the time complexity for a learn data sample with N data tuples from O(N^3) to approximately O(N^2). The size of the learned rule set can be further reduced by applying a context-sensitive feature selection that individually removes the unnecessary inputs from each rule's premise. The algorithm can also be viewed as an instance-based learning algorithm, namely as an exemplar-based generalization approach. Thus the idea of the k-nearest-neighbor algorithm (knn), to base the decision on several surrounding learn data tuples, can be transferred to improve the prediction accuracy. The number of free parameters of the Pnc 2 has been reduced in a preliminary study with some development benchmarks. The Pnc 2 is then compared experimentally with the most similar existing algorithms, namely the Nge, the Rise and, of course, the knn algorithm. All remaining free parameters of the Pnc 2 are tuned using cross-validation or similar approaches within the respective learn data samples.
The Pnc 2 outperforms the Nge algorithm and its variants and reaches accuracies better than or comparable to those of the knn or the Rise algorithm, with typically much smaller rule set/model sizes.
Acknowledgement
The Pnc 2 cluster algorithm was developed while I was a scholarship holder in the postgraduate research program Modelling and Model-Based Design of Complex Technological Systems at the Chair of Electrical Control Engineering at the University of Dortmund, Germany. My research project was initiated by Prof. Dr. rer. nat. H. Kiendl. The postgraduate research program was funded by the Deutsche Forschungsgemeinschaft (DFG).
Note
The Pnc 2 Rule Induction System is a free Windows software tool that uses the Pnc 2 cluster algorithm to automatically induce rules from a given data sample. Additionally, a DOS command line version will be available soon. The kernels are written in ANSI C++, are well documented, and should compile easily on different operating systems. You may download the program at http://www.newty.de/pnc2/index.html.
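The distance measure and the cuboid premise described in the abstract can be illustrated with a minimal Python sketch. It is not the Pnc 2 implementation: the abstract does not give the exact normalization, weighting, merge test or cuboid extension, so the sketch assumes range normalization for continuous inputs, a 0/1 overlap distance for nominal inputs, uniform weights by default, and continuous-only cuboids. The names `hetero_distance` and `CuboidRule` are hypothetical.

```python
from dataclasses import dataclass


def hetero_distance(a, b, nominal, ranges, weights=None, p=2.0):
    """Weighted Minkowski distance over mixed inputs (assumed form).

    nominal: set of input indices that hold nominal values.
    ranges:  per-input value range (> 0) for continuous inputs; ignored for nominal ones.
    """
    weights = weights or [1.0] * len(a)
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in nominal:
            d = 0.0 if x == y else 1.0   # overlap metric: equal or not
        else:
            d = abs(x - y) / ranges[i]   # range-normalized difference
        total += weights[i] * d ** p
    return total ** (1.0 / p)


@dataclass
class CuboidRule:
    """If-Then rule whose premise is an axis-parallel cuboid (continuous inputs only)."""
    lower: list
    upper: list
    label: str

    @classmethod
    def from_tuple(cls, x, label):
        # A single learn data tuple starts as a degenerate cuboid (one point).
        return cls(list(x), list(x), label)

    def merge(self, other):
        # Only clusters with the same output value may be merged.
        if self.label != other.label:
            raise ValueError("clusters have different output values")
        return CuboidRule(
            [min(l, m) for l, m in zip(self.lower, other.lower)],
            [max(u, v) for u, v in zip(self.upper, other.upper)],
            self.label,
        )

    def covers(self, x):
        # The premise fires if x lies inside the cuboid on every input.
        return all(l <= v <= u for l, v, u in zip(self.lower, x, self.upper))


# Usage: distance between two mixed tuples, then merge two single-tuple clusters.
dist = hetero_distance((0.0, "red"), (5.0, "blue"), nominal={1}, ranges=[10.0, None])
rule = CuboidRule.from_tuple([1.0, 4.0], "A").merge(CuboidRule.from_tuple([3.0, 2.0], "A"))
```

In a full agglomerative loop, the pair of same-label clusters with the smallest distance would be merged only if the resulting rule passed the merge test; that test is left out here because the abstract describes it only as "a kind of hit rate".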
Similar resources
An integrated approach for scheduling flexible job-shop using teaching–learning-based optimization method
In this paper, teaching–learning-based optimization (TLBO) is proposed to solve the flexible job-shop scheduling problem (FJSP) based on the integrated approach, with the objective of minimizing makespan. The FJSP is an extension of the basic job-shop scheduling problem. It has two subproblems: the routing problem and the sequencing problem. If both the sub problems are solved simultaneously, t...
MMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider both the accuracy and the interpretation of rule sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and data sets with imbalanced class distributions. This...
INTEGRATED ADAPTIVE FUZZY CLUSTERING (IAFC) NEURAL NETWORKS USING FUZZY LEARNING RULES
The proposed IAFC neural networks have both stability and plasticity because they use a control structure similar to that of the ART-1 (Adaptive Resonance Theory) neural network. The unsupervised IAFC neural network is an unsupervised neural network that uses the fuzzy leaky learning rule. This fuzzy leaky learning rule controls the updating amounts by fuzzy membership values. The supervised IAFC ...
Relational Reinforcement Rule Induction and the Effect of Pruning
Covering Algorithm (CA) is a Machine Learning field that produces a powerful repository represented as simple if-then rules. Although this field is well established for discrete data, it has deficiencies when dealing with numeric data. This paper introduces a new algorithm called RULES-CONT, which deals with continuous attributes using Relational Reinforcement Learning (RRL). This algorith...
Multi-objective Differential Evolution for the Flow shop Scheduling Problem with a Modified Learning Effect
This paper proposes an effective multi-objective differential evolution algorithm (MDES) to solve a permutation flow shop scheduling problem (PFSSP) with modified Dejong's learning effect. The proposed algorithm combines the basic differential evolution (DE) with local search and borrows the selection operator from NSGA-II to improve the general performance. First the problem is encoded with a...
An integrated vendor–buyer model with stochastic demand, lot-size dependent lead-time and learning in production
In this article, an imperfect vendor–buyer inventory system with stochastic demand, process quality control and learning in production is investigated. It is assumed that there is learning in production and investment in process quality improvement at the vendor's end, and lot-size dependent lead-time at the buyer's end. The lead-time for the first batch and those for the rest of the batches ...
Journal title:
Volume, issue
Pages -
Publication date: 2003